Dataset description
There are two submissions: 10267 & 10270.
- In each submission, 2390 families with .vcf files are included.
- For each family, two vcf files are provided:
  - one named “sorted”;
  - the other named “annotated”.
Submission 10267
- For files named “sorted”,
  - 852 families without GL/PL information
  - 1538 families with valid GL/PL information
    - 310 Trios
    - 1228 families with >=1 siblings
- For files named “annotated”,
  - 1096 families without GL/PL information
  - 1294 families with valid GL/PL information
    - 309 Trios
    - 985 families with >=1 siblings
Note that for FID:13562, there is no father information in the .vcf file. Also, all families with valid GL/PL information in the files named “annotated” are also included among the families with valid GL/PL information in the files named “sorted”.
Submission 10270
- For files named “sorted”, there is no GL/PL information.
- For files named “annotated”,
  - 703 families without valid GL/PL information
    - including 13 families with fewer than 2000 variants.
  - 1687 families with valid GL/PL information
    - 292 Trios
    - 1395 families with >=1 siblings
Combined
Note that, combining 10267 & 10270, there are 2206 families with complete vcf files.
- 415 Trios
- 1791 families with >=1 siblings
Call de novo mutations
Triodenovo was used to call de novo mutations:
- Only variants with GL/PL information were retained.
- Families were split into parent-offspring trios.
- Filters: --minDP 7 --minDepth 10 and other default options
- Post filters (following Homsy et al. 2015, Science):
  - For offspring: a minimum of 10 total reads, a minimum of 5 alternate allele reads, and a minimum alternate allele ratio of 20% if alternate allele reads are ≥10 or of 28% if alternate allele reads are <10
  - For parents: a minimum depth of 10 reference reads and an alternate allele ratio <3.5%
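The post filters above can be written as a small read-count check. The following is a minimal Python sketch, not the code in call_deno.R; the function names are hypothetical, and it assumes that the alternate allele ratio is computed as alternate reads divided by reference plus alternate reads.

```python
def alt_ratio(ref_reads: int, alt_reads: int) -> float:
    """Alternate allele ratio, assumed here to be alt / (ref + alt)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total > 0 else 0.0


def offspring_passes(ref_reads: int, alt_reads: int) -> bool:
    """Offspring: >=10 total reads, >=5 alternate reads, and an alternate
    ratio of >=20% (alt reads >=10) or >=28% (alt reads <10)."""
    total = ref_reads + alt_reads
    if total < 10 or alt_reads < 5:
        return False
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return alt_ratio(ref_reads, alt_reads) >= min_ratio


def parent_passes(ref_reads: int, alt_reads: int) -> bool:
    """Parent: >=10 reference reads and an alternate ratio <3.5%."""
    return ref_reads >= 10 and alt_ratio(ref_reads, alt_reads) < 0.035


def dnm_passes(child, father, mother) -> bool:
    """Each argument is a (ref_reads, alt_reads) tuple for one trio member."""
    return (offspring_passes(*child)
            and parent_passes(*father)
            and parent_passes(*mother))
```

For example, `dnm_passes((8, 5), (30, 0), (30, 1))` keeps the call, whereas a child with 20 reference and 5 alternate reads fails because fewer than 10 alternate reads require a ratio of at least 28%.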
The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R
Annotation
- ANNOVAR was used to annotate refGene and allele frequencies.
- hg19refGene, exac03nonpsy, gnomad_exome211 databases were used.
- Based on the annotation, DNMs were further filtered to keep:
  - exonic or canonical splice-site variants
  - variants with MAF <= 0.001 in the non-psychiatric subset of ExAC (Header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (Header: controls_AF_popmax in ANNOVAR).
- Gene-level pLI for PTVs was downloaded from ExAC
- MPC scores for missense variants were annotated using VEP.
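The annotation-based filter above can be sketched as a per-row check on the ANNOVAR output. This is an illustrative Python sketch rather than the original script. It assumes that the functional class is in a column named Func.refGene with the labels "exonic" and "splicing", and that a missing frequency (written as "." by ANNOVAR) means the variant is absent from that database and is treated as frequency 0. The function names are hypothetical.

```python
MAF_MAX = 0.001  # threshold stated in the filter description


def parse_af(value: str) -> float:
    """Convert an allele-frequency field to float; '.' or empty is
    treated as 0 (assumption: variant absent from the database)."""
    return 0.0 if value in (".", "") else float(value)


def is_exonic_or_splice(func: str) -> bool:
    """Func.refGene may hold several ';'-separated labels, e.g.
    'exonic;splicing'. Labels such as 'ncRNA_exonic' do not match."""
    return bool(set(func.split(";")) & {"exonic", "splicing"})


def keep_variant(row: dict) -> bool:
    """Apply the functional-class and MAF filters to one annotated row."""
    return (is_exonic_or_splice(row["Func.refGene"])
            and parse_af(row["ExAC_nonpsych_ALL"]) <= MAF_MAX
            and parse_af(row["controls_AF_popmax"]) <= MAF_MAX)
```

A row with `Func.refGene = "exonic"` and both frequency fields set to "." is kept, while an intronic variant or one with a frequency above 0.001 in either database is dropped.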
DNMs summary
After applying the filters, a total of 4136 DNMs were found in 1758 families with 2430 offspring.
- 3378/4136 (81.7%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
- 273 trio families (with 440 DNMs) and 1485 quad families (with 3696 DNMs, including 1855 DNMs in 1118 probands and 1841 DNMs in 1039 siblings).
- 3592 DNMs in 2074 males and 594 DNMs in 356 females.
- 2295 DNMs in 1391 probands and 1841 DNMs in 1039 siblings.
- 2792 DNMs were absent from ExAC, 2861 DNMs were absent from gnomAD, and 2577 DNMs were absent from both datasets.
DNM counts
Note that a cutoff of 10 was used to exclude individuals with DNM counts > 10; this cutoff corresponds to the 99% quantile of DNM counts.
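The count cutoff can be illustrated with a short Python sketch. It is not taken from the original analysis code; the function name and the input (one individual ID per DNM) are assumptions made for the example.

```python
from collections import Counter


def dnm_counts_within_cutoff(dnm_individual_ids, max_count=10):
    """Count DNMs per individual and keep only individuals whose
    count does not exceed max_count (10 in the note above)."""
    counts = Counter(dnm_individual_ids)
    return {iid: n for iid, n in counts.items() if n <= max_count}
```

With one entry per DNM, an individual listed 12 times is dropped and an individual listed 3 times is returned with a count of 3.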

DNM mutation types

pLIs for PTVs

MPC scores for missense variants

DNMs in quad families
- A total of 3696 DNMs were observed in 1485 quad families
- 1855 DNMs in 1118 probands and 1841 DNMs in 1039 siblings
- 3222 DNMs in 1886 males and 474 DNMs in 271 females.
DNM counts

DNM mutation types

pLIs for PTVs

MPC scores for missense variants

Burden test analysis